Exploring approaches to tackle cross-domain challenges in brain medical image segmentation: a systematic review

Introduction Brain medical image segmentation is a critical task in medical image processing and plays a significant role in the prediction and diagnosis of diseases such as stroke, Alzheimer's disease, and brain tumors. However, differences in scanners, imaging protocols, and populations across sites produce substantial distribution discrepancies among datasets from different sources. This leads to cross-domain problems in practical applications. In recent years, numerous studies have been conducted to address the cross-domain problem in brain image segmentation.
Methods This review adheres to the standards of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) for data processing and analysis. We retrieved relevant papers from the PubMed, Web of Science, and IEEE databases published from January 2018 to December 2023, and extracted information about the medical domain, imaging modalities, methods for addressing cross-domain issues, experimental designs, and datasets from the selected papers. Moreover, we compared the performance of methods in stroke lesion segmentation, white matter segmentation, and brain tumor segmentation.
Results A total of 71 studies were included and analyzed in this review. The methods for tackling the cross-domain problem include Transfer Learning, Normalization, Unsupervised Learning, Transformer models, and Convolutional Neural Networks (CNNs). On the ATLAS dataset, domain-adaptive methods showed an overall improvement of approximately 3% in stroke lesion segmentation tasks compared to non-adaptive methods. However, for white matter segmentation on MICCAI 2017 and brain tumor segmentation on BraTS, the diversity of datasets and experimental methodologies across current studies makes it challenging to compare the strengths and weaknesses of these methods directly.
Conclusion Although various techniques have been applied to address the cross-domain problem in brain image segmentation, there is currently a lack of unified dataset collections and experimental standards. For instance, many studies are still based on n-fold cross-validation, while methods directly based on cross-validation across sites or datasets are relatively scarce. Furthermore, due to the diverse types of medical images in the field of brain segmentation, it is not straightforward to make simple and intuitive comparisons of performance. These challenges need to be addressed in future research.


Introduction
Medical image segmentation, particularly for the brain, is a crucial and challenging task in the field of medical imaging analysis, with a wide range of applications from disease diagnosis to treatment planning. The complexity of this task is further compounded by the cross-domain nature of the data, which arises from variations in scanners, imaging protocols, and patient populations among different sites (Dolz et al., 2018; Ravnik et al., 2018). This review aims to provide an overview of the progress made in cross-domain brain medical image segmentation. Figure 1 illustrates brain images and the corresponding segmented lesion areas.
Domain-adaptive methods are designed to adapt a model that has been trained on one domain (the source domain) to perform well on a different but related domain (the target domain). This is useful in situations where there is a lot of labeled data in the source domain but little to no labeled data in the target domain. Domain adaptation techniques attempt to learn the shift or differences between the source and target domains and adjust the model accordingly. Techniques include feature-level adaptation, instance-level adaptation, and parameter-level adaptation, among others.
Non-adaptive methods, on the other hand, do not make any adjustments to account for differences between the source and target domains. They are trained on one domain and then directly applied to another. This approach can work well if the source and target domains are very similar, but performance can degrade if there are significant differences between the two domains. Because non-adaptive methods do not leverage any domain adaptation techniques, they can suffer from a problem known as domain shift or dataset shift, where the distribution of data in the target domain differs from the distribution in the source domain.
Transfer learning has emerged as a popular approach to leverage pre-trained models on new data, demonstrating success in various studies (Knight et al., 2018; Bermudez and Blaber, 2020; Zhou et al., 2022; Liu D. et al., 2023; Torbati et al., 2023). Unsupervised learning methods, which do not require labeled data from the target domain, have also shown promising results in cross-domain brain image segmentation (Atlason et al., 2019; Rao et al., 2022). Recently, self-supervised learning, where models are pre-trained on auxiliary tasks before being fine-tuned on the main task, has been increasingly adopted (Ntiri et al., 2021; Liu et al., 2022a; Tomar et al., 2022).
In addition, different strategies have been proposed to handle specific challenges in cross-domain brain image segmentation. For instance, normalization techniques have been used to reduce scanner-related variability (Ou et al., 2018; Goubran et al., 2020; Dinsdale et al., 2021). Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have been employed to generate synthetic images that share the same distribution as the target domain, thus improving the model's generalizability (Zhao et al., 2019; Cerri et al., 2021; Tomar et al., 2022). Model ensembling and federated learning approaches have also been explored to leverage the strengths of multiple models or to perform decentralized learning (Reiche et al., 2019).

FIGURE
An example of lesion segmentation in brain (Liew et al., ).
et al., 2022; Zhou et al., 2022; Liu D. et al., 2023; Yu et al., 2023b; Zhang et al., 2023). Despite the significant progress, cross-domain brain image segmentation remains a challenging problem. Future research directions may include the development of more robust and generalizable models, the exploration of novel domain adaptation techniques, and the incorporation of multimodal imaging data to improve segmentation performance. The studies reviewed herein provide valuable insights into these potential avenues for future advancement (Liu Y. et al., 2020; Jiang et al., 2021; Liu et al., 2022a; Rao et al., 2022; Torbati et al., 2023).

Materials and methods
Inclusion criteria and search terms
The search process for this study adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher et al., 2009) guidelines. To gather relevant research on cross-domain issues in brain medical image segmentation, we designated three main categories of keywords: Medical Imaging, Segmentation, and Domain. Specific keywords for each category are shown in Table 1. We use the Boolean operator "OR" to connect keywords within the same category, while "AND" is used to connect different categories. In this way, we can construct complex search queries. Articles that focus on cross-domain issues in brain medical image segmentation are included in our review.
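The query construction described above can be sketched in a few lines of Python. This is an illustration only: the keyword lists are placeholders (the actual terms are those in Table 1), and the function name `build_query` is an assumption, not part of the review's tooling.

```python
def build_query(categories):
    """Join keywords with OR inside a category and AND across categories."""
    groups = []
    for keywords in categories.values():
        # Quote each keyword so multi-word phrases are searched as a unit.
        quoted = [f'"{kw}"' for kw in keywords]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical keywords for illustration; the real ones are listed in Table 1.
example_categories = {
    "Medical Imaging": ["brain MRI", "brain CT"],
    "Segmentation": ["segmentation"],
    "Domain": ["cross-domain", "domain adaptation", "multi-site"],
}

query = build_query(example_categories)
print(query)
```

Each database has its own field tags and syntax, so a string built this way would likely need adjusting before being submitted to PubMed, IEEE, or Web of Science.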

Screening and selection process
We used three search engines for literature retrieval: PubMed, IEEE, and Web of Science, with the search time frame being from January 2018 to December 2023 for journal or conference articles. In compliance with the PRISMA guidelines, the first stage of the screening process is to merge duplicate articles from different search engines. In the second stage, we screen based on the title and abstract of the articles, discarding those not relevant to our topic, such as those that do not include keywords like "brain medical imaging," "segmentation," or "domain" in the title and abstract. In the third stage, we select the eligible articles.

Data extraction
From the screened articles, we extracted the following information: author names, publication year, dataset name, dataset size, parts included in the dataset, cross-domain type, solution method, and evaluation metrics. For more detailed information about the solution methods, please refer to Tables 2 and 3.
Enhancements based upon the UNet model continue to represent a prevalent research direction in medical image segmentation. Subsequent models, such as 3D-CNN, exhibit commendable performance in many 3D data scenarios, albeit at the cost of substantial computational resources. In comparison, newer network structures such as the Transformer are gradually gaining traction in medical image segmentation, and further innovations based on this architecture are anticipated.
Methods grounded in different learning types are somewhat niche in comparison. On the whole, unsupervised and semi-supervised learning methods do not perform as well as their supervised counterparts. This discrepancy is likely attributable to the relatively small datasets available in the field of medical imaging, unlike the voluminous data present in natural language processing and computer vision.
Mathematically based methods are currently often combined with deep learning models to enhance their interpretability. This area of work is particularly meaningful and holds significant potential.
There is a broad spectrum of data preprocessing techniques available, including Generative Adversarial Networks (GANs), which can be employed for data augmentation to enhance data diversity.
The array of tools available for medical image segmentation is continually expanding, and the barriers to their use are lowering at the same time.
In addition to extracting key data from cross-domain research in the field of brain image segmentation, we have also conducted a focused comparative analysis of cross-domain algorithms for three important branches of brain image segmentation: stroke lesion segmentation, white matter segmentation, and brain tumor segmentation.
Due to the variety of datasets employed in the selected articles, it is challenging to compare the merits and demerits of each algorithm on a holistic basis. To compare the effectiveness of these algorithms, it becomes necessary to delve into more specific areas of segmentation. The ATLAS, MICCAI 2017, and BraTS datasets, each employed five times, stand out as the most frequently used. They correspond respectively to stroke lesion segmentation, white matter segmentation, and brain tumor segmentation.

Results
Figure 2 presents the PRISMA flow diagram for this review. The numbers of articles retrieved from the three databases (PubMed, IEEE, Web of Science) were 487, 332, and 890, respectively. An additional seven articles were identified through the references of confirmed papers. After merging duplicate studies, 1,286 articles were obtained. Following the title and abstract screening, 364 articles remained. Finally, after full-text review, 71 articles were included in the review. Table 4 documents the details of the finally collected articles.
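The screening counts above can be tallied with a short script to show how many records left the pool at each stage. This is a sketch under one assumption that the text does not state explicitly: the seven reference-derived articles are pooled with the database results before duplicates are merged.

```python
# Counts reported in the text for the PRISMA flow.
retrieved = {"PubMed": 487, "IEEE": 332, "Web of Science": 890}
from_references = 7
after_deduplication = 1286
after_title_abstract = 364
included = 71

# Assumption: reference-derived articles are pooled before deduplication.
total_identified = sum(retrieved.values()) + from_references
duplicates_removed = total_identified - after_deduplication
excluded_at_screening = after_deduplication - after_title_abstract
excluded_at_full_text = after_title_abstract - included

print(total_identified, duplicates_removed,
      excluded_at_screening, excluded_at_full_text)
```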

Year of publication
As illustrated in Figure 3, the number of papers addressing cross-domain segmentation in brain imaging has generally increased since 2018, with a peak of 15 papers in 2021. This trend indicates that there are still many challenges to overcome in this field, affirming its status as an active area of research.
Datasets
As can be seen from Table 4 and Figure 4, of the 71 articles reviewed, 41 utilized public datasets, encompassing 56 different types. Among these, as Figure 5 shows, the most frequently used datasets were ATLAS, MICCAI 2017, and BraTS, each used only five times. The remaining datasets were used less often, with the majority being used only once. Thus, within the field of brain image segmentation, many articles addressing cross-domain issues still rely on proprietary datasets, and those that do use public datasets draw from a wide variety.

Disease or region
For a more specific analysis, we included the disease type or brain region that is segmented in our data extraction. This addition enables a deeper understanding of which diseases are related to brain image segmentation and which regions require segmentation. Figure 6 shows the disease categories and regions extracted from the reviewed papers.

FIGURE
The PRISMA diagram detailing this systematic review.
Among them, whole-brain segmentation accounts for the largest proportion.

Cross-domain type
Based on the data collected, we identified several types of cross-domain variation in brain medical image segmentation, shown in Figure 7. The most common type of variation is "multi-site," with 37 articles addressing this particular challenge. This is followed by "multi-scanner," which is the focus of 18 articles. "Multi-center" and "multi-modal" variations were discussed in 10 and six articles, respectively. These findings highlight the diverse range of cross-domain challenges encountered in the segmentation of brain medical images, underscoring the need for further research and method development in this area.

Solution method
As shown in Figure 8, a diverse range of techniques is employed for cross-domain segmentation in brain medical imaging. The most prevalent methods include UNet, CNN, 3D-CNN, and Transfer Learning, indicating a strong reliance on convolutional architectures and on pre-existing models. Other techniques such as Normalization, Self-Supervised learning, and GANs are also being utilized, albeit less frequently. A handful of studies explore alternative approaches including Unsupervised learning, Data Augmentation, and Transformer-based methods. This diversity of methodologies underscores the complexity of the challenge and the ongoing innovation in the field.
Due to the diversity in datasets and experimental methods, it is not feasible to compare the performance of all algorithms. However, it is possible to compare the algorithms that have utilized the ATLAS, MICCAI 2017, and BraTS datasets.

Stroke lesion segmentation
Dataset
To begin with, we introduce the dataset used, ATLAS. The MR modality of the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset is T1. It has two versions: ATLAS v1.2 (Liew et al., 2017)

Algorithms
Cross-domain algorithms, as the name suggests, are designed to generalize and perform well across multiple, diverse datasets. A notable example from 2023, the Fan-Net (Yu et al., 2023b), utilizes Fourier-based adaptive normalization for stroke lesion segmentation. In 2021, the Unlearning algorithm (Dinsdale et al., 2020) was proposed to unlearn dataset biases for MRI harmonization and confound removal. Similarly, RAM-DSIR (Zhou et al., 2022) in 2022 pursued generalizable medical image segmentation via random amplitude mixup, and SAN-Net (Yu et al., 2023a) in 2023 learned to generalize to unseen sites.
On the other hand, for performance comparison, we have also selected some non-cross-domain algorithms that are optimized for specific tasks or datasets. For instance, U-Net (Ronneberger et al., 2015), proposed in 2015, is an early example of convolutional networks for biomedical image segmentation. In 2018, DeepLab v3+ (Chen et al., 2018) introduced atrous separable convolution for semantic image segmentation. More recently, in 2020, nnU-Net (Isensee et al., 2021) presented a self-configuring method for deep learning-based biomedical image segmentation.

Evaluation result
In the realm of cross-domain segmentation in brain medical imaging, specifically for stroke lesion segmentation, the performance of various methods demonstrates a compelling trend toward the adoption of cross-domain algorithms.
As can be seen from Table 6, among the non-cross-domain algorithms, CLCI-Net exhibits the highest Dice and F1-score, demonstrating superior segmentation accuracy. However, nnU-Net, despite having a slightly lower Dice score, requires the fewest floating-point operations (FLOPs), indicating a more efficient use of computational resources.
Shifting focus to cross-domain algorithms, SAN-Net outperforms the rest in all three performance metrics (Dice, Recall, and F1-score), highlighting its robustness in handling cross-domain segmentation tasks. Notably, RAM-DSIR, despite having the fewest parameters, delivers competitive results, suggesting an efficient model with less complexity.
In conclusion, while non-cross-domain algorithms such as CLCI-Net and nnU-Net exhibit commendable performance, cross-domain algorithms, particularly SAN-Net and RAM-DSIR, demonstrate superior performance and efficiency in stroke lesion segmentation. This underscores the potential and advantages of cross-domain approaches in this field, prompting further exploration and development in this direction.
In order to benchmark stroke lesion segmentation algorithms under non-domain adaptation scenarios, we refer to the results collated by Malik et al. (2024). As shown in Table 7, eight stroke lesion segmentation algorithms from the ATLAS project were employed. Many of these algorithms achieved a Dice Similarity Coefficient (DSC) of up to 0.7, with the highest-performing algorithm, the seventh one, reaching 0.844. This significantly surpasses the maximum DSC of 0.597 achieved when conducting domain adaptation testing. Therefore, it is currently challenging for domain adaptation algorithms to match the performance of algorithms tested without domain adaptation, because cross-domain testing evaluates a model on data whose distribution differs from that of its training data.
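For reference, the DSC used throughout these comparisons is the standard overlap measure DSC = 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B is the reference mask. The following is a minimal sketch on flattened binary masks; the function name and the toy masks are illustrative, and the handling of two empty masks is a convention assumed here rather than one taken from the reviewed papers.

```python
def dice_coefficient(prediction, reference):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(prediction) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled 1 in both masks.
    intersection = sum(p * r for p, r in zip(prediction, reference))
    total = sum(prediction) + sum(reference)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 overlapping voxels, 4 voxels in each mask.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
prediction = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_coefficient(prediction, reference))  # 2 * 3 / (4 + 4) = 0.75
```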
White matter segmentation

Dataset
As shown in Table 8, the dataset MICCAI 2017 is derived from the WMH MICCAI 2017 challenge (Kuijf et al., 2019). This dataset encompasses MRI scans from multiple sites, including the University Medical Center Utrecht (UMC Utrecht), the National University Health System Singapore (NUHS Singapore), the VU University Medical Center Amsterdam (VU Amsterdam), and two undisclosed locations.
In total, 60 samples are utilized for training, while the testing set comprises 110 samples. The diversity and scale of this dataset allow the performance of methods to be evaluated in a comprehensive and accurate manner. The training data can be downloaded at https://wmh.isi.uu.nl.

Algorithms
In the context of white matter medical imaging, several notable papers stand out. The Voxel-Wise Logistic Regression (VLR) (Knight et al., 2018) algorithm, introduced in 2018,

Evaluation result
Table 9 presents the results of five different methods, all of which focus on the cross-domain segmentation problem in white matter imaging. In the table, "-" means there is no valid data. It is important to note that, with the exception of the second and third methods, the experimental datasets and experimental procedures used in each method are distinct from each other.
For instance, the VLR method employed three datasets, which included seven sites, and performed a leave-one-out cross-validation with respect to these sites. The SC U-net and MixDANN methods, on the other hand, only employed three sites from the MICCAI 2017 training data for cross-validation. The Ensemble U-net method used all of the training data from MICCAI 2017 for training and the test data for testing. Lastly, the TDA method utilized both the MICCAI 2017 dataset and VH, a private dataset, performing cross-validation between these datasets.
Therefore, while there are numerous studies addressing the cross-domain problem in the field of white matter segmentation, direct comparisons between them are challenging. This is due to the variations in the experimental data and procedures used, even when the same dataset is utilized in different studies. The differences in experimental procedures are manifested in whether cross-validation is performed between sites or between datasets. Although it is challenging to make a direct comparison between each algorithm, an overall observation can be made in the field of white matter segmentation. Specifically, the Dice Similarity Coefficient (DSC) is above 0.7 when cross-validation is conducted between sites, while the DSC is only around 0.5 when cross-validation is carried out between datasets. This observation suggests that cross-validation between datasets is more challenging, yet it is also closer to real-world scenarios.
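To make the between-site protocol concrete, the sketch below generates leave-one-site-out splits: each site is held out in turn for testing while the remaining sites are used for training. The function name, the site labels, and the six-scan example are hypothetical and are not drawn from any of the reviewed studies.

```python
def leave_one_site_out(site_labels):
    """Yield (held_out_site, train_indices, test_indices) for each site in turn."""
    for held_out in sorted(set(site_labels)):
        train = [i for i, s in enumerate(site_labels) if s != held_out]
        test = [i for i, s in enumerate(site_labels) if s == held_out]
        yield held_out, train, test


# Hypothetical site label for each of six scans.
sites = ["Utrecht", "Utrecht", "Singapore", "Singapore", "Amsterdam", "Amsterdam"]

folds = list(leave_one_site_out(sites))
for held_out, train, test in folds:
    print(held_out, train, test)
```

Unlike ordinary n-fold cross-validation, which shuffles scans from all sites together, no scan from the held-out site is seen during training. Cross-validation between datasets follows the same pattern with dataset names in place of site names.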
Brain tumor segmentation

Algorithms
In 2021, a learnable Self-Attentive Spatial Adaptive Normalization (SASAN) (Tomar et al., 2021) method was introduced, utilizing adversarial training to address the domain gap in radiological images. In 2022, two algorithms were presented. One algorithm is grounded in a knowledge distillation scheme incorporating exponential mixup decay (EMD) (Liu et al., 2022b) to progressively acquire target-specific representations, while the other algorithm is the Unsupervised Domain Adaptation (UDA) method based on Self-Semantic Contour Adaptation (SSCA) (Liu et al., 2022a). In 2023, another UDA (Qin et al., 2023) method, based on semi-supervised learning, was proposed. Additionally, in the same year, the Multimodal Contrastive Domain Sharing (Multi-ConDoS) (Zhang et al., 2023) generative adversarial networks were introduced.

Evaluation result
As shown in Table 11, Whole, Core, and Enh represent the Dice Similarity Coefficient (DSC) for the whole tumor, tumor core, and enhancing tumor, respectively. While all five articles conducted cross-domain studies on brain tumor segmentation using the BraTS datasets, each article employed different source and target domains. As a result, direct comparisons of algorithm performance across the experimental results are challenging.

Discussion
The field of brain medical image segmentation has seen significant advancements with the widespread application of deep learning technologies. However, the challenge of domain adaptation continues to be a crucial issue. In our review, we have identified a variety of methods proposed to address this issue, including transfer learning, normalization, unsupervised learning, Transformer models, and convolutional neural networks, among others. Each of these methods has its strengths but also comes with certain limitations.
Transfer learning is a common approach to addressing domain adaptation issues, with the main idea being to apply knowledge learned in one domain (source domain) to another domain (target domain). However, the effectiveness of this method is influenced by the distribution difference between the source and target domains. If the distribution difference is too large, the effectiveness of transfer learning may be compromised.
Normalization is another common method for addressing domain adaptation issues, with the main idea being to reduce the differences between different datasets by adjusting the brightness and contrast of images. However, this method may result in the loss of some important image information, thereby affecting the accuracy of segmentation results.
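As a simple illustration of the idea, the sketch below applies z-score intensity normalization, which removes a global brightness offset and contrast gain between two scanners. This is only one basic normalization scheme, chosen here for illustration and not attributed to any particular reviewed paper; the intensity values and the gain and offset of the second scanner are made up.

```python
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to zero mean and unit standard deviation."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [0.0 for _ in intensities]
    return [(v - mu) / sigma for v in intensities]


# Hypothetical intensities of the same tissue profile from two scanners:
# scanner B has a different contrast gain (x2) and brightness offset (+50).
scanner_a = [10.0, 20.0, 30.0, 40.0]
scanner_b = [2.0 * v + 50.0 for v in scanner_a]

norm_a = zscore_normalize(scanner_a)
norm_b = zscore_normalize(scanner_b)
print(norm_a)
print(norm_b)
```

After normalization the two profiles coincide, because the simulated difference is purely linear. Real scanner differences are generally not a single global gain and offset, which is one reason such rescaling can fall short in practice.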
Unsupervised learning and Transformer models have also been used in some studies to address domain adaptation issues. The advantage of unsupervised learning is that it does not require labeled data, but its performance is usually not as good as that of supervised learning. The advantage of Transformer models is that they can handle long-range dependencies, but they have a high computational complexity and require a large amount of computational resources.
Furthermore, we have observed that despite the application of various techniques to address domain adaptation issues in brain medical imaging, there currently exists a lack of unified dataset collections and experimental standards.
For instance, as illustrated in Figure 4, 42.3% of the papers only use private data, while 8.5% of the papers use both public and private data. As shown in Figure 5, even when public datasets are used, there is significant diversity amongst them. As indicated in Tables 9 and 11, even when a single identical dataset is used, if the experimental data and methods differ, it remains challenging to make comparisons among various algorithms. Moreover, the vast majority of current algorithms are not open-source, making it nearly impossible to reproduce the algorithms in the papers and design similar experiments for comparison.
Consequently, this makes it difficult to compare the performance of different studies and accurately assess the effectiveness of new methods. Therefore, future research needs to further develop more effective domain adaptation methods and establish unified dataset collections and experimental standards.
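The dataset proportions quoted above can be cross-checked against the count of 41 articles using public datasets reported in the Results. The short calculation below assumes that each of the 71 articles falls into exactly one of three groups (private only, public only, or both), which the text implies but does not state.

```python
total_articles = 71

# Percentages quoted from Figure 4, converted back to article counts.
private_only = round(0.423 * total_articles)
both = round(0.085 * total_articles)

# Assumption: the three groups partition the 71 articles.
public_only = total_articles - private_only - both
uses_public = public_only + both

print(private_only, both, public_only, uses_public)
```

Under this assumption the percentages correspond to 30 private-only articles, 6 using both, and 35 public-only articles, which gives 41 articles using public data, consistent with the Results.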

FIGURE
Year of publication of the reviewed papers.

FIGURE
Proportion of public or private.

FIGURE
Information of datasets used by the reviewed papers.

FIGURE
The disease type or brain region that is segmented.

FIGURE
Proportion of cross-domain types.

FIGURE
Solution method used for cross-domain.

Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Special Fund of Advantageous and Characteristic Disciplines (Group) of Hubei Province and the Scientific Research Plan Project of Hubei Province Department of Education 2021.B 2021312. This work was partially funded by the Health Research Council of New Zealand's project 21/144, the MBIE Catalyst: Strategic Fund NZ-Singapore Data Science Research Programme UOAX2001, the Marsden Fund Project 22-UOA-120, and the Royal Society Catalyst: Seeding General Project 23-UOA-055-CSG.
TABLE Key features of solution method.
, released in 2018, includes 304 cases from 11 research
TABLE A summary of the data extracted from the reviewed papers.
Liew et al., 2022), released in 2022, includes 1,271 cases. Although it contains more data, its relatively recent release means that fewer articles have used it for cross-domain image segmentation to date. Therefore, we have chosen ATLAS v1.2 as our comparison dataset. As shown in Table 5, ATLAS v1.2 includes nine sites.
TABLE The nine source sites of the T1-weighted MR images in the experiment.
TABLE Stroke lesion segmentation algorithms that do not use cross-domain testing.
TABLE Comparison of brain tumor segmentation methods.
have been proposed to tackle this issue, each possesses its own strengths and limitations. Future research needs to delve deeper into novel methods to enhance the performance of domain adaptation in brain medical image segmentation. Moreover, it is imperative to establish unified dataset collections and experimental standards for a more accurate evaluation of the performance of different methods. Only through this approach can we gain a better understanding of the strengths and weaknesses of various methods and develop more effective solutions. Finally, we anticipate further advancements in deep learning technologies to address the domain adaptation problem in brain medical image segmentation. This progress will improve the accuracy of medical image methods